-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bin_df_cols
leave input df unchanged
#192
Conversation
@@ -21,7 +21,7 @@ def count_elements( | |||
values: ElemValues, | |||
count_mode: ElemCountMode = ElemCountMode.composition, | |||
exclude_elements: Sequence[str] = (), | |||
fill_value: float | None = 0, | |||
fill_value: float | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janosh This seem to cause the behaviour change in #215, is this an accidental change? I think the original default value of 0
is correct here? i.e. using 0
for element this is missing?
pymatviz/pymatviz/process_data.py
Line 55 in 82c087b
fill_value (float | None): Value to fill in for missing elements. Defaults to 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for looking into this! 🙏 i think returning None
for missing elements in count_elements
is more correct/semantic than 0. so ideally, ptable_heatmap_ratio
should be updated to work with the new behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think returning None for missing elements in count_elements is more correct/semantic than 0
I guess both choices hold merit from certain standpoint, i.e. None
is more semantic to suggest that particular element is missing, 0
is also valid as its count is indeed zero.
For the particular function, however, I slight prefer using zero because it seems to be designed to "count element occurrence":
pymatviz/pymatviz/process_data.py
Lines 26 to 28 in 82c087b
"""Count element occurrence in list of formula strings or dict-like compositions. | |
If passed values are already a map from element symbol to counts, ensure the | |
data is a pd.Series filled with zero values for missing element symbols. |
So its occurrence would be zero if that particular element doesn't appear in the data.
ideally, ptable_heatmap_ratio should be updated to work with the new behavior
I'm afraid this would have broader impact as there're quite a few usage that need to be checked:
For example:
from pymatviz import count_elements
from pymatviz.enums import ElemCountMode, Key
from matminer.datasets import load_dataset
dataset = load_dataset("matbench_expt_gap")[Key.composition]
counted_dataset = count_elements(dataset, ElemCountMode.composition, fill_value=0)
print(counted_dataset)
With fill value zero:
symbol
H 235.00
He 0.00
Li 967.66
Be 121.00
B 1168.00
...
Fl 0.00
Mc 0.00
Lv 0.00
Ts 0.00
Og 0.00
With None
:
symbol
H 235.00
He NaN
Li 967.66
Be 121.00
B 1168.00
...
Fl NaN
Mc NaN
Lv NaN
Ts NaN
Og NaN
Using None
seem to require an additional process step .
So do we want to change this completely or revert this?
No description provided.